Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Translated by Google Translate
We consider a model where a signal (discrete or continuous) is observed with an additive Gaussian noise process. The signal is generated by a linear combination of a finite but increasing number of translated features. The features are continuously parameterized by their location and depend on a scale parameter. First, we extend previous prediction results for off-the-grid estimators by taking into account that the scale parameter may vary. The prediction bounds are analogous, but we improve the minimal distance between two consecutive feature locations required to achieve these bounds. Next, we propose a goodness-of-fit test for the model and give non-asymptotic upper bounds on the testing risk and on the minimax separation rate between two distinguishable signals. In particular, our test encompasses the signal detection framework. We deduce upper bounds on the minimal energy, expressed as the 2-norm of the linear coefficients, needed to successfully detect a signal in the presence of noise. The general model considered in this paper is a non-linear extension of the classical high-dimensional regression model. It turns out that, in this framework, our upper bound on the minimax separation rate matches (up to a logarithmic factor) the lower bound on the minimax separation rate for signal detection in the high-dimensional linear model associated with a fixed dictionary of features. We also propose a procedure to test whether the features of the observed signal belong to a given finite collection, under the assumption that the linear coefficients may vary but do not change to opposite signs under the null hypothesis. A non-asymptotic upper bound on the testing risk is given. We illustrate our results on the spike deconvolution model with Gaussian features on the real line and with the Dirichlet kernel, frequently used in the compressed sensing literature, on the torus.
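To make the observation model concrete, here is a minimal sketch of the discrete setting. The signal is a sum of translated features a_k * phi(x; t_k, s_k), and each feature has its own scale s_k. The two feature families named above appear as functions. Several details are assumptions and not taken from the paper: the unnormalised Gaussian shape, the torus parameterised as [0, 1), i.i.d. noise on the grid, and the function names `gaussian_feature`, `dirichlet_kernel` and `observe`.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_feature(x: float, location: float, scale: float) -> float:
    """Unnormalised Gaussian feature centred at `location` with width `scale` (assumed form)."""
    return math.exp(-((x - location) ** 2) / (2.0 * scale ** 2))


def dirichlet_kernel(x: float, cutoff: int) -> float:
    """Dirichlet kernel on the torus [0, 1): sum over |k| <= cutoff of exp(2*pi*i*k*x)."""
    # The imaginary parts cancel, leaving 1 + 2 * sum_{k=1}^{cutoff} cos(2*pi*k*x).
    return 1.0 + 2.0 * sum(math.cos(2.0 * math.pi * k * x) for k in range(1, cutoff + 1))


def observe(grid: Sequence[float], coeffs: Sequence[float], locations: Sequence[float],
            scales: Sequence[float], noise_level: float,
            seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the clean signal and a noisy sample y(x) = sum_k a_k * phi(x; t_k, s_k) + w(x)."""
    rng = random.Random(seed)
    # Each feature may have its own scale s_k, matching the varying-scale setting.
    clean = [
        sum(a * gaussian_feature(x, t, s) for a, t, s in zip(coeffs, locations, scales))
        for x in grid
    ]
    # Independent Gaussian noise at each grid point (assumed noise structure).
    noisy = [v + rng.gauss(0.0, noise_level) for v in clean]
    return clean, noisy
```

For example, `observe([i / 100 for i in range(101)], [1.0, -0.5], [0.3, 0.7], [0.05, 0.08], 0.1)` places two well-separated spikes of different widths and signs. Moving the two locations closer together makes the features harder to tell apart, which is the regime the minimal-separation condition addresses.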
Summarizing novel chapters is a difficult task due to the input length and the fact that sentences in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach in which the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also produces a dataset for extractive summarization that is highly skewed towards negative instances; we therefore adopt a margin ranking loss for extraction to encourage separation between positive and negative examples. Our extraction component operates at the constituent level, and our approach enriches the text with spinal tree information, which provides syntactic context (in the form of constituents) to the extraction model. We show an improvement of 3.71 ROUGE-1 points over the best results reported in prior work on an existing novel chapter dataset.
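The margin ranking loss mentioned above penalises a positive (summary-worthy) unit whenever its score does not exceed a negative unit's score by at least a margin. Because the loss depends on relative scores, the large number of negatives does not by itself push every score towards "negative". Below is a minimal sketch in plain Python. The all-pairs pairing of positives with negatives and the default margin of 1.0 are assumptions, not necessarily the paper's choices. With target 1, the per-pair term has the same form as PyTorch's `torch.nn.MarginRankingLoss`.

```python
from typing import Sequence


def margin_ranking_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
                        margin: float = 1.0) -> float:
    """Average hinge loss max(0, margin - (s_pos - s_neg)) over all positive/negative pairs."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            # Zero loss once the positive outscores the negative by at least `margin`.
            total += max(0.0, margin - (sp - sn))
    return total / (len(pos_scores) * len(neg_scores))
```

For instance, `margin_ranking_loss([2.0], [0.5, 1.5])` returns 0.25. The pair (2.0, 0.5) already clears the margin and contributes 0. The pair (2.0, 1.5) falls 0.5 short, and that shortfall is averaged over two pairs.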
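This paper addresses the problem of 2D pose representation during unsupervised 2D-to-3D pose lifting, with the aim of improving the accuracy, stability and generalisability of 3D human pose estimation (HPE) models. All unsupervised 2D-3D HPE approaches provide the entire 2D kinematic skeleton to the model during training. We argue that this is sub-optimal and disruptive, because it induces long-range correlations between independent 2D keypoints and the predicted 3D coordinates during training. To this end, we conduct the following study. With a maximum architecture capacity of 6 residual blocks, we evaluate the performance of 5 models, each of which represents the 2D pose differently during the adversarial unsupervised 2D-3D HPE process. We also show the correlations between 2D keypoints that are learned during training, highlighting the unintuitive correlations induced when the entire 2D pose is given to the lifting model. Our results show that the optimal representation of a 2D pose is two independent segments, the torso and the legs, with no shared features between the lifting networks. Compared with a model of almost identical parameter count trained on the entire 2D kinematic skeleton, this approach reduces the average error by 20% on the Human3.6M dataset. Furthermore, given the complex nature of adversarial learning, we show how this representation also improves convergence during training, so that an optimal result is obtained more often.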
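The two-segment representation described above can be sketched as routing disjoint subsets of 2D keypoints to separate lifting networks and then reassembling the 3D pose. Several things here are hypothetical and labelled as such in the code. The 17-joint ordering is a Human3.6M-style layout, and the exact torso/leg joint split is an assumption that the abstract does not specify. The lifters are stand-in callables, not the paper's residual networks.

```python
from typing import Callable, List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Lifter = Callable[[Sequence[Point2D]], List[Point3D]]

# Hypothetical 17-joint ordering (Human3.6M-style): 0 hip, 1-3 right leg,
# 4-6 left leg, 7 spine, 8 thorax, 9 neck, 10 head, 11-13 left arm,
# 14-16 right arm. The torso/leg assignment below is an assumption.
LEG_JOINTS = [0, 1, 2, 3, 4, 5, 6]
TORSO_JOINTS = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


def lift_by_segments(pose_2d: Sequence[Point2D],
                     torso_lifter: Lifter,
                     leg_lifter: Lifter) -> List[Point3D]:
    """Lift each segment with its own network; no features are shared between them."""
    if len(pose_2d) != len(LEG_JOINTS) + len(TORSO_JOINTS):
        raise ValueError("expected a 17-joint 2D pose")
    # Each lifter only ever sees the keypoints of its own segment, so no
    # correlation between torso and leg keypoints can be learned.
    torso_3d = torso_lifter([pose_2d[j] for j in TORSO_JOINTS])
    legs_3d = leg_lifter([pose_2d[j] for j in LEG_JOINTS])
    # Scatter the per-segment predictions back into full-skeleton order.
    by_joint = dict(zip(TORSO_JOINTS, torso_3d))
    by_joint.update(zip(LEG_JOINTS, legs_3d))
    return [by_joint[j] for j in range(len(pose_2d))]
```

To test the routing, pass two toy lifters that assign different constant depths. Every output joint then reveals which segment network produced it.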
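Large-scale social networks are thought to contribute to polarization by amplifying people's biases. However, the complexity of these technologies makes it difficult to identify the mechanisms responsible and to evaluate mitigation strategies. Here we show, under controlled laboratory conditions, that transmission of information through social networks amplifies motivational biases on a simple perceptual decision-making task. Across 40 independently evolving populations, participants in a large behavioral experiment made biased decisions at a higher rate when they were part of a social network than asocial participants did. Using techniques from machine learning and Bayesian statistics, we identify a simple adjustment to content-selection algorithms that is predicted to mitigate bias amplification. This algorithm generates a sample of perspectives from within an individual's network that is more representative of the population as a whole. In a second large experiment, this strategy reduced bias amplification while maintaining the benefits of information sharing.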
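We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and the bot take turns driving the conversation, producing an engaging and fluent experience. Deployed in the fourth iteration of the Alexa Prize Socialbot Grand Challenge, Chirpy Cardinal handled thousands of conversations per day and placed second out of 9 bots, with an average user rating of 3.58/5.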
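We consider a general non-linear model in which the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setting. We propose an off-the-grid optimization method, that is, a method that does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. Using recent results on the geometry of off-the-grid methods, we give a minimal separation on the true underlying non-linear parameters under which interpolating certificate functions can be constructed. Using also tail bounds for suprema of Gaussian processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to logarithmic factors, similar to the rate attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify, with high probability, the quality of estimation for both the linear and the non-linear parameters.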
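We present TIPSy-GAN, a new approach to improving the accuracy and stability of unsupervised adversarial 2D-to-3D human pose estimation. In our work we demonstrate that the human kinematic skeleton should not be assumed to be a single spatially interdependent structure. In fact, we argue that when the full 2D pose is provided during training, an inherent bias is learned in which the 3D coordinate of each keypoint becomes spatially dependent on the 2D coordinates of all other keypoints. To investigate this hypothesis, we follow previous adversarial approaches but train two generators on spatially independent parts of the kinematic skeleton, the torso and the legs. We find that improving the self-consistency cycle is key to lowering the evaluation error, and we therefore introduce new consistency constraints during training. A TIPSy model is produced via knowledge distillation from these generators; it predicts the 3D coordinates for the entire 2D pose with improved results. Furthermore, we address a question left unanswered in prior work: how long to train in a truly unsupervised scenario. We show that two independent generators trained adversarially are more stable than a solo generator, which collapses. TIPSy reduces the average error by 17% compared with a baseline solo generator on the Human3.6M dataset. TIPSy improves on other unsupervised approaches, while also performing strongly against supervised and weakly supervised approaches in evaluations on both the Human3.6M and MPI-INF-3DHP datasets.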
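The distillation step described above can be pictured as follows. The two segment generators (teachers) each predict 3D coordinates for their own joints. Those predictions are merged into a full-skeleton target, and the single student model, which sees the whole 2D pose, is trained to match that target. This is a minimal sketch under assumptions. The mean-squared-error objective and the helper names `merge_teacher_outputs` and `distillation_loss` are illustrative, not the paper's stated loss. In practice the student's prediction would come from a neural network rather than plain tuples.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def merge_teacher_outputs(torso_3d: Sequence[Point3D], legs_3d: Sequence[Point3D],
                          torso_joints: Sequence[int],
                          leg_joints: Sequence[int]) -> List[Point3D]:
    """Combine the two segment generators' predictions into one full-skeleton target."""
    if set(torso_joints) & set(leg_joints):
        raise ValueError("torso and leg joint sets must be disjoint")
    if len(torso_3d) != len(torso_joints) or len(legs_3d) != len(leg_joints):
        raise ValueError("each prediction must cover exactly its segment's joints")
    by_joint: Dict[int, Point3D] = dict(zip(torso_joints, torso_3d))
    by_joint.update(zip(leg_joints, legs_3d))
    # Return joints in index order (assumes indices 0..N-1 are all covered).
    return [by_joint[j] for j in sorted(by_joint)]


def distillation_loss(student_3d: Sequence[Point3D],
                      teacher_3d: Sequence[Point3D]) -> float:
    """Mean squared Euclidean distance between student and teacher joints (assumed objective)."""
    if len(student_3d) != len(teacher_3d):
        raise ValueError("student and teacher must predict the same joints")
    total = 0.0
    for s, t in zip(student_3d, teacher_3d):
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(student_3d)
```

The student's target comes only from the two independent teachers. Any cross-segment dependence the student learns therefore has to be justified by the data, not inherited from a single generator trained on the full skeleton.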